Abstract Sparse-representation-based classification (SRC), which
classifies data based on the sparse reconstruction error, has emerged
as a new technique in pattern recognition. However, the computational
cost of sparse coding is heavy in real applications. In this paper,
various dimension reduction methods are studied in the context of SRC
to improve classification accuracy as well as reduce computational
cost. A feature extraction method, i.e., principal component analysis,
and feature selection methods, i.e., Laplacian score and Pearson
correlation coefficient, are applied in the data preparation step to
preserve the structure of data in the lower-dimensional space.
Classification performance of SRC with structure-preserving dimension
reduction (SRC-SPDR) is compared to classical classifiers such as
k-nearest neighbors and support vector machines. Experimental tests
with the UCI and face data sets demonstrate that SRC-SPDR is effective
with relatively low computation cost.

Dimension reduction • Structure preserving

Introduction 

In recent years, sparse representation (or sparse coding) has
received a lot of attention. The key idea is to search for the
least number of basis vectors (or atoms) in a dictionary
$A \in \mathbb{R}^{m \times n}$ to characterize a signal
$y \in \mathbb{R}^{m}$ (A has n atoms and each atom is a vector with
m elements). Therefore, the signal can be represented as a sparse
vector $x \in \mathbb{R}^{n}$ based on the atoms. The atoms in
$\mathbb{R}^{m}$ are the column vectors in A. Sparse representation
improves performance in a number of applications [64], such as coding
[42], classification [62], image denoising [13], smart radio [30, 31],
dimension reduction [18, 60] and so on.

Sparse coding has extensive connections to biologically inspired
and cognitive approaches. In [42], the properties of the primary
visual cortex are used to interpret sparse linear codes. In the
research on V1 simple cell receptive fields [70], sparse coding is
trained using biologically realistic plasticity rules. In [16],
sparse coding is used to explain brain function in primate cortex.
In [49], extracting covariance patterns based on sparse coding gives
a promising direction in cognitive brain region identification.
Large-scale brain modeling is a promising direction in cognitive
science. In [43], the application of sparse coding in associative
memory patterns has been pointed out, which could contribute to
complicated brain modeling. Recently, structured sparse coding has
been proposed based on neocortical representations [23].

In this paper, we focus on classification based on sparse
representation in low dimension. The work is partially inspired by a
sparse-representation-based classification (SRC) method recently
proposed in [56], which searches for the training samples producing
the minimum reconstruction error of testing data. Results reported in
[56] were very promising and competitive with those from traditional
classification methods, such as support vector machine (SVM) and
k-nearest neighbors (KNN).

It  is  well  known  that  sparse  representation  methods  are 
computationally  intensive.  The  number  and  dimension  of 
atoms  in  a  dictionary  affect  computation  cost  significantly. 
In  the  community,  there  are  three  techniques  to  reduce  the 
computational  complexity  of  sparse  coding: 

•  Structure-preserving  dimension  reduction  (SPDR):  The 
purpose  is  to  reduce  redundancy  as  well  as  retain 
structure  in  the  data  preparation  process.  Many 
researchers  have  devoted  their  work  to  achieve  this 
goal  [4,  20,  24,  45,  66].  Various  classic  dimension 
reduction  methods  have  been  applied  to  sparse  coding. 
Sparse latent semantic analysis (sparse LSA) was proposed in [8]:
a sparsity constraint via $\ell_1$ regularization was added to the
formulation of LSA, which is a popular unsupervised dimension
reduction tool. Experimental results show that sparse LSA could be
effective at reducing the cost of projection computation and memory.
The multi-label sparse coding framework with feature extraction [52]
was applied to automatic image annotation, and comparisons with
state-of-the-art algorithms demonstrated its efficiency. In our
previous work [59], we significantly extended definitions for the
sparse representation method and investigated its analytical
characteristics as well as empirical results.

•  Dictionary  construction:  The  key  to  successful  sparse 
coding  lies  in  the  dictionary.  There  are  two  main 
approaches  to  constructing  a  dictionary:  analytic  design 
and  dictionary  learning.  Analytic  design  establishes 
proper  atoms  from  abstract  function  spaces  [34]  or  pre¬ 
constructed  dictionaries,  such  as  wavelets  [40]  and 
contourlets [5]. In dictionary learning [36], various techniques
such as regularization and clustering are applied to training data
to build the dictionary. In [14], the least-squares error was
utilized via the method of optimal directions (MOD) to train the
dictionary. Online dictionary learning [39] used stochastic
approximations to update the dictionary with a large data set. The
Laplacian score dictionary (LSD) [58], which is based on the
geometric local structure of training data, selects the atoms for
the dictionary.


•  Efficient optimization algorithm: Different optimization methods
are embedded in the sparse coding process to improve computational
efficiency. A convex version of sparse coding was proposed in [3]:
a regularization function via compositional norms was implemented in
convex coding, and a boosting-style algorithm was derived.
Experimental results in the image denoising task showed the
advantages of the boosted coding algorithm. In efficient sparse
coding algorithms [38], $\ell_1$-regularized and $\ell_2$-constrained
least-squares problems were solved iteratively, and applications to
image processing showed significant acceleration of sparse coding.
In [22], a nonlinear feed-forward predictor was trained to produce
the sparse code, and the proposed method required 10 times less
computation cost than previous competitors.

In this paper, we present a combined SRC and SPDR framework.
Dimension reduction can effectively reduce the computation cost and
extract useful structural information. It can also contribute to
improved performance in recognition tasks: (i) Discriminative
learning for dimensionality reduction was proposed in [37]. A
supervised form of latent Dirichlet allocation (LDA) was derived.
The class label information was incorporated into LDA, which enabled
the discriminative application of LDA. (ii) A five-step procedure,
which combined different dimension reduction methods with
classification, was proposed in [11]. In particular, partial least
squares (PLS), sliced inverse regression (SIR) and principal
component analysis (PCA) were compared in terms of classification
performance with gene expression data sets.

Similarly  in  our  work,  four  dimension  reduction  meth¬ 
ods,  i.e.,  PCA,  Laplacian  score  (abbreviated  as  LAP), 
Pearson  correlation  coefficient  (abbreviated  as  COR)  and 
minimum-redundancy  maximum-relevancy  (abbreviated  as 
mRMR)  [45]  are  studied  in  the  SRC  framework,  and 
extensive  experiments  in  comparison  with  other  classic 
classifiers  (SVM  and  KNN)  are  carried  out. 

The  contributions  of  this  paper  can  be  summarized  as 
follows: 

•  A comprehensive study of various SPDR methods in sparse
representation is presented. In particular, the performance of
feature extraction and feature selection methods is examined.

•  The  proposed  methods  are  successfully  applied  to  both 
the  UCI  data  sets  and  face  image  data  sets.  While  most 
sparse  coding  work  has  concentrated  on  natural  signal 
and  image  data  sets,  very  few  have  applied  sparse 
coding  to  the  feature  space  data  sets  (UCI  data  sets). 

•  Very competitive classification results are obtained on both the
UCI data sets and the face data sets, providing new insight into the
capabilities of sparse representation.

The rest of the paper is organized as follows: Sect. 2 reviews
related SPDR methods. Section 3 presents our proposed SRC based on
SPDR. Section 4 presents experimental results of this framework on
the UCI data sets and face data sets. Finally, Sect. 5 gives
conclusions and discusses future work.


Classic  Dimension  Reduction  Methods  with  Structure 
Preserving 

Dimension reduction is an important method in knowledge discovery
[21] and machine learning [57]. Structure-preserving [47] constraints
can improve dimension reduction. Normally, certain compact
coordinates are obtained by dimension reduction methods to preserve
special properties of the input data. The distance properties of
data points are preserved by multidimensional scaling [10]. The
local geometry of the data set is studied by nonlinear manifold
learning [54]. Eigenvector-based multivariate analysis [51] reveals
the internal structure in terms of variance.

Many sparse-representation-based dimension reduction algorithms
have been developed recently, including elastic net [68], sure
independence screening [15] and the Dantzig selector [7]. These
studies normally focus on reducing the number of atoms for sparse
representation, such as setting certain sparse coefficients to zero.
For example, in group structure sparsity [32] and tree structure
sparsity [35], the sparse coefficients are modified based on this
prior structural information. However, there is little work on
exploring the relationship between lower-dimension data sets and
sparse representation.

There are two categories of dimension reduction: feature extraction
(such as PCA) and feature selection. PCA is a linear transformation
that best represents the data in the least-squares sense. Any signal
can be coarsely reconstructed as a linear combination of principal
components. Sparse PCA [69] was proposed based on lasso constraints,
resulting in sparse loadings. Feature selection [67] focuses on
searching for a subset of features from the original feature set.
Some feature selection methods [48, 61] combined with sparse
representation have been shown to be effective.

A huge volume of literature is devoted to projecting
high-dimensional data to a lower-dimensional space through various
methods, such as locally linear embedding, linear discriminant
analysis, PCA, LAP [29] and COR [24]. We choose four of them to
combine with SRC, following the observation in [56] that SRC is not
sensitive to a particular projection method.


PCA  Criteria  and  Eigenface 

PCA was first used for face recognition by Turk and Pentland [51],
in a method now known as eigenfaces. Given a training set of face
images $I_1, I_2, \ldots, I_n$, the first step is to represent each
image $I_i$ with a vector $\Gamma_i$, and then subtract the average
face $a = \frac{1}{n}\sum_{i=1}^{n} \Gamma_i$ from each training face
image vector: $\Phi_i = \Gamma_i - a$.

Next, eigenvalues and eigenvectors of the covariance matrix C of
the mean-subtracted face vectors can be obtained:

$$C = \frac{1}{n}\sum_{i=1}^{n} \Phi_i \Phi_i^{T} \qquad (1)$$


Typically,  only  the  n  most  significant  eigenvalues  and  their 
corresponding  eigenvectors  are  calculated.  The  resulting 
eigenvectors  are  the  eigenfaces.  Each  test  image  will  be 
projected  onto  these  eigenvectors,  and  the  coefficient 
vectors  are  then  used  in  the  classification. 

PCA is a popular dimension reduction method, which projects the
data in the directions of maximal variance to obtain the minimum
reconstruction error. Normally, it is a linear data transformation
that preserves the global structure, but kernel-based PCA can be
applied to nonlinear problems. A PCA-based method has been
successfully applied in the identification of human population
structure [44].
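The projection step described in this subsection can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pca_fit` and `pca_project`, the use of the SVD in place of an explicit eigendecomposition of the covariance matrix, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, d):
    """X: (n_samples, m) data matrix. Returns the mean and the top-d principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # subtract the average sample
    # Rows of Vt are eigenvectors of the covariance matrix of Xc,
    # ordered by decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:d]

def pca_project(X, mean, components):
    """Coefficient vectors of X in the basis of the retained components."""
    return (X - mean) @ components.T

# Synthetic data with decreasing variance per feature (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
mean, comps = pca_fit(X_train, d=2)
Z = pca_project(X_train, mean, comps)               # shape (40, 2)
```

Test samples would be projected with the same `mean` and `comps` that were fitted on the training data, so that training and test coefficients live in the same low-dimensional space.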

Laplacian  Criteria  for  Dimension  Reduction 

The  Laplacian  method  preserves  local  geometrical  struc¬ 
tures  without  the  data  labels.  LAP  [29]  is  a  new  feature 
selection  method  based  on  Laplacian  eigenmaps  and 
locality-preserving  projection.  The  score  evaluates  the 
feature’s  importance  according  to  its  locality-preserving 
ability. 

For the data $Y = [y_1, y_2, \ldots, y_n]$ with feature set
$F = [f_1, f_2, \ldots, f_m]$, let $L_r$ denote the LAP of the rth
feature $f_r$. The LAP is calculated as follows:

1.  It first constructs a nearest neighbor graph G whose nodes are
the data points ($y_i$ and $y_j$, $i, j = 1, \ldots, n$) in the data
set. $S_{ij} = e^{-\|y_i - y_j\|^2 / t}$ represents the score between
data $y_i$ and $y_j$, where t is a suitable constant.

2.  Then $L_r$ can be defined as

$$L_r = \frac{\tilde{f}_r^{T} L \tilde{f}_r}{\tilde{f}_r^{T} D \tilde{f}_r} \qquad (2)$$

where $D = \mathrm{diag}(S\mathbf{1})$, $\mathbf{1} = [1, \ldots, 1]^{T}$,
$L = D - S$, and $\tilde{f}_r$ is a kind of normalization via:

$$\tilde{f}_r = f_r - \frac{f_r^{T} D \mathbf{1}}{\mathbf{1}^{T} D \mathbf{1}} \mathbf{1} \qquad (3)$$

In LAP, the nearest neighbor graph based on the data points is
established, and the local structure is evaluated by the weights
between nodes. Therefore, the structure in the graph preserves the
discriminative features in the feature space.
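The Laplacian score computation can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' code: the function name `laplacian_score`, the symmetrized k-nearest-neighbor graph with zero weight between non-neighbors (the usual construction for this score), the defaults `k=5` and `t=1.0`, and the synthetic data are assumptions. In the standard formulation, a smaller score indicates better locality preservation.

```python
import numpy as np

def laplacian_score(Y, k=5, t=1.0):
    """Y: (n_samples, m_features). Returns one Laplacian score per feature."""
    n = Y.shape[0]
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # squared distances
    nn = sq.copy()
    np.fill_diagonal(nn, np.inf)                               # exclude self-neighbors
    order = np.argsort(nn, axis=1)[:, :k]                      # k nearest neighbors per node
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), order.ravel()] = True
    mask |= mask.T                                             # symmetrize the graph
    S = np.where(mask, np.exp(-sq / t), 0.0)                   # heat-kernel weights on edges
    d = S.sum(axis=1)                                          # diagonal of D = diag(S 1)
    L = np.diag(d) - S                                         # graph Laplacian L = D - S
    scores = np.empty(Y.shape[1])
    for r in range(Y.shape[1]):
        f = Y[:, r]
        f_t = f - (f @ d) / d.sum()                            # normalization of the feature
        # Assumes the feature is not constant, otherwise the denominator is zero.
        scores[r] = (f_t @ L @ f_t) / (f_t @ (d * f_t))
    return scores

# Feature 0 carries a two-cluster structure; features 1 and 2 are noise.
rng = np.random.default_rng(0)
cluster = np.repeat([0.0, 10.0], 20)
Y = np.column_stack([cluster + rng.normal(0, 0.1, 40),
                     rng.normal(0, 1, 40),
                     rng.normal(0, 1, 40)])
scores = laplacian_score(Y)
ranking = np.argsort(scores)                                   # best feature first
```

Selecting the d features at the front of `ranking` gives the reduced feature set used as input to the classifier.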


Pearson Correlation Coefficient Criteria and mRMR

Pearson correlation coefficient is based on the covariance matrix
and can select the feature variables related to the target labels
[24]. Normally, it is a supervised feature selection method. The COR
between two different variables is:

$$\rho(\alpha_i, \alpha_j) = \frac{\mathrm{cov}(\alpha_i, \alpha_j)}{\sqrt{\mathrm{var}(\alpha_i) \times \mathrm{var}(\alpha_j)}} \qquad (4)$$

According to the max-dependency and min-redundancy [45] concepts,
the feature selection process can combine the dependency and
redundancy criteria together. In this work, these criteria are used
in the COR. First, max-relevance criteria are applied from different
features ($f_i \in F$) to the target c to get the most relevant
feature.

$$\max D(F, c), \quad D = \frac{1}{|F|} \sum_{f_i \in F} \rho(f_i, c) \qquad (5)$$

Then for min-redundancy criteria, the selected feature is:

$$\min R(F), \quad R = \frac{1}{|F|^2} \sum_{f_i, f_j \in F} \rho(f_i, f_j) \qquad (6)$$

In order to combine max-relevance and min-redundancy, an operator
$\Phi(D, R)$ is defined. It is a simple form of optimization of D
and R.

$$\max \Phi(D, R), \quad \Phi = D - R \qquad (7)$$

In the process of incremental feature selection [45], suppose we
have chosen m − 1 features with the feature set $F_{m-1}$. In the
mth step, it selects the mth feature from the feature set
$\{F - F_{m-1}\}$. This can be derived from $\Phi(D, R)$. The
criteria are:

$$\max_{f_j \in \{F - F_{m-1}\}} \left[ \rho(f_j, c) - \frac{1}{m-1} \sum_{f_i \in F_{m-1}} \rho(f_j, f_i) \right] \qquad (8)$$

Our method constructs a sub-feature space with the features most
connected with the class (category) center while filtering redundant
features, which are criteria of dependence structure between the
features and class centers. The mRMR method uses mutual information
to build the relations between the features, which is popular and
robust in many applications; the detailed settings are described in
[45].


SRC Based on Dimension Reduction

In sparse representation, assume a dictionary with a set of training
data vectors (or atoms)
$A = [a_1^{1}, \ldots, a_1^{n_1}, \ldots, a_c^{1}, \ldots, a_c^{n_c}]$,
where $A \in \mathbb{R}^{m \times n}$, c is the class label for each
atom, and $n_i$ is the number of atoms associated with category i.
Then a new test data vector y is represented in the form:

$$y = Ax \in \mathbb{R}^{m} \qquad (9)$$

where $x = [0, \ldots, 0, \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0]^{T} \in \mathbb{R}^{n}$
is the sparse vector (coefficients). In order to calculate x, we use
the $\ell_1$-regularized least-squares method [33, 50] defined as:

$$x = \arg\min_x \{\|y - Ax\|_2^2 + \lambda \|x\|_1\} \qquad (10)$$

In [56], SRC utilizes the representation residual to predict class
labels for test samples. In particular, a characteristic function
($\delta_i : \mathbb{R}^{n} \to \mathbb{R}^{n}$) is defined for each
category i, which selects the sparse coefficients associated with
that category. The classification is based on:

$$\mathrm{label}(y) = \arg\min_i r_i(y), \quad r_i(y) = \|y - A\delta_i(x)\|_2 \qquad (11)$$


In [56], several dimension reduction methods are combined with SRC
to show that SRC is robust with low-dimensional features from
images. In particular, random faces, downsampled faces and
Fisherfaces are used as low-dimensional features for SRC. The paper
claims that the choice of dimension reduction method does not
significantly impact SRC performance, and that sufficient
dimensionality (such as dimension 100 for face data) of the data is
more important for SRC.

We follow this direction to propose the SPDR method, which we apply
to the UCI and image data sets. In detail, a dimension reduction
projection $P_{S(B)}$ is applied to the input data Y and all the
atoms in A, where subspaces are denoted by S, and S(B) means the
subspace spanned by matrix B. The dimension of the data is thereby
changed from m to d (m > d). In our work, the matrix B is obtained
from PCA, LAP, COR and mRMR. Algorithm 1 shows in detail the
procedure of the SRC method with SPDR. In classification, a function
$\varphi$ is built from a training set $(Y_i, c_i)$. The goal of
dimension reduction for classification is to retrieve subspaces
[9, 53] which are the most relevant to the classification, noted as
subspace S(B), such that:

$$\varphi(Y) = \varphi(P_{S(B)} Y) \qquad (12)$$

that is, the decision rule $\varphi$ established from the projected
data $P_{S(B)} Y$ should be the same as that established from the
original data Y.

Fig. 1 Sparse-representation-based classifier applied to the
“Libras Movement” data set with 3 different feature selection
methods: (a) PCA feature SRC, (b) LAP feature SRC, (c) COR feature
SRC. For each case, the upper figure shows sparse coefficients based
on the corresponding dictionaries (the dots denote the non-zero
coefficients; horizontal axis: dictionary atom index), and the lower
figure shows representation residuals $r_i(y)$ on different
categories (horizontal axis: category, 1 to 15)


A case study of this algorithm is shown in Fig. 1. The algorithm is
run over the “Libras Movement” data set from the UCI source [17].
The dimension of the data is reduced from the original 90 to 20 via
PCA, LAP and COR, respectively. SRC is then applied to a test vector
y based on a dictionary containing 180 training vectors. The SRC on
test data is shown in Fig. 1. In particular, the sparse coefficients
based on the dictionary and the corresponding representation
residuals on different classes are exhibited. We can observe that:

•  The classification performance is the same in each of these three
cases, and the dimension-reduced data are sufficient for SRC to make
judgments.

•  The coefficients from LAP and COR are sparser than those from
PCA.

•  Category 4 has the second smallest residual, so it provides an
indication of the similarity between category 4 and category 7.
There may be potential for SRC to cluster similar categories.

•  The results from PCA and LAP are similar, partially because they
are both unsupervised methods. COR, which is a supervised feature
selection method, produces results that are quite different from the
others.

Algorithm 1 SRC-SPDR

1: Input: a set of training data $A \in \mathbb{R}^{m \times n}$ with c classes, test data $y \in \mathbb{R}^{m}$, a dimension reduction matrix B
2: Compute new data $\tilde{A} = P_{S(B)} A$ and $\tilde{y} = P_{S(B)} y$
3: Solve the $\ell_1$-regularized least-squares problem:
   $x = \arg\min_x \{\|\tilde{y} - \tilde{A}x\|_2^2 + \lambda \|x\|_1\}$
4: Compute the residuals:
   $r_i(\tilde{y}) = \|\tilde{y} - \tilde{A}\delta_i(x)\|_2$ for $i = 1, \ldots, c$
5: Output: $\mathrm{label}(y) = \arg\min_i r_i(\tilde{y})$
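The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments: the l1_ls solver is replaced here by a plain ISTA (iterative soft-thresholding) loop, the reduction is assumed to be given as a d × m matrix `B` that maps data to reduced coordinates, and the names `ista_lasso` and `src_spdr_predict`, the value of `lam` and the toy data are hypothetical.

```python
import numpy as np

def ista_lasso(A, y, lam=0.01, n_iter=500):
    """Minimize ||y - A x||_2^2 + lam * ||x||_1 with ISTA (stand-in for l1_ls)."""
    lip = 2.0 * np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - 2.0 * A.T @ (A @ x - y) / lip       # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / lip, 0.0)   # soft thresholding
    return x

def src_spdr_predict(A, labels, y, B, lam=0.01):
    """A: (m, n) atoms as columns, labels: (n,), y: (m,), B: (d, m) reduction matrix."""
    A_low, y_low = B @ A, B @ y                     # step 2: reduce dictionary and test vector
    x = ista_lasso(A_low, y_low, lam)               # step 3: sparse coding
    classes = np.unique(labels)
    # Step 4: keep only the coefficients of one class at a time and measure the residual.
    residuals = np.array([
        np.linalg.norm(y_low - A_low @ np.where(labels == c, x, 0.0))
        for c in classes
    ])
    return classes[np.argmin(residuals)], residuals  # step 5: smallest residual wins

# Toy data: class 0 atoms lie near e1, class 1 atoms near e2 (illustrative only).
rng = np.random.default_rng(0)
centers = np.eye(6)[:2]
A = np.column_stack([centers[c] + 0.05 * rng.normal(size=6)
                     for c in (0, 1) for _ in range(5)])
labels = np.repeat([0, 1], 5)
y = centers[1] + 0.05 * rng.normal(size=6)
B = np.eye(6)[:3]                                   # feature selection: keep first 3 features
label, residuals = src_spdr_predict(A, labels, y, B)
```

For a feature selection method, `B` consists of rows of the identity matrix as above; for PCA, `B` would hold the retained principal directions as rows (with the data centered beforehand).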


Complexity  Discussion 

Suppose we have a signal $f \in \mathbb{R}^{N}$; it can be
decomposed into an orthonormal basis $\Psi$ and a coefficient vector
x, which can be written in the following way:

$$f = \sum_{i=1}^{N} x_i \psi_i = \Psi x \qquad (13)$$

The vector x is of dimension N × 1, and the orthobasis $\Psi$ is in
$\mathbb{R}^{N \times N}$. x is called S-sparse if it has at most S
non-zero elements. The signal f is composed of a best subset of S
columns of an orthogonal basis of size N × N. It implies that the
number of choices for this particular subset is combinatorial,
namely $\binom{N}{S}$ candidate subsets.

To reconstruct the signal f from a linear combination of vectors,
we would like to constrain the error to be smaller than a fixed
approximation error while keeping the least number of vectors that
are orthogonal to each other. However, this is an NP-hard problem
[2]. In the past literature, before the restricted isometry property
(RIP) condition was discovered, the matching pursuit method was
developed to solve $\ell_0$ norm minimization problems. Here the
$\ell_0$ norm is a pseudo-norm defined as
$\|x\|_{\ell_0} = \#\{i \ \text{s.t.}\ x[i] \neq 0\}$.


In compressive sensing problems, a measurement space is introduced
where there is an observation $y \in \mathbb{R}^{m}$, and y is
obtained by a random sampling matrix $\Phi \in \mathbb{R}^{m \times N}$.

$$y = \Phi \Psi x \qquad (14)$$

Therefore, the orthogonal transforming space $\Psi$ is measured by
the sampling matrix $\Phi$. Furthermore, there are S vectors selected
by the sparse vector x from the matrix $\Phi\Psi$, making up an
orthogonal subspace, and hence their linear combinations are the
observation of the original vector. From the sampled observation y
and the random sensing matrix $\Phi$, x can be found under the fixed
space $\Psi$, thereby recovering the original signal f.

As proved in [6], when the RIP condition holds, $\ell_1$ norm
minimization can reconstruct the signal as well as $\ell_0$ norm
minimization with overwhelmingly high probability if
$m = O(S \log(N/S))$. Furthermore, $\ell_1$ norm minimization is a
convex optimization problem and can be solved via LASSO regression,
which is much more tractable than minimizing the $\ell_0$ norm.
Lasso has a relatively low polynomial computational cost of
$O(m^2 N)$ time [41].

Using the sparse representation to reconstruct a signal relies on
the same computation framework as compressive sensing, in the way of
sparsely selecting a subset of atoms that are linearly independent
of each other from the over-complete dictionary
$A \in \mathbb{R}^{M \times N}$ ($M \ll N$).

In order to reduce the expensive computation cost during
optimization, we propose a hierarchical sparse coding framework,
which has a dictionary with fewer dimensions but without
compromising the performance in the multi-label classification task.
On one hand, the dimension reduction before sparse coding
classification has an overhead computation cost. On the other hand,
it reduces the cost in the sparse coding stage. The dimension is
reduced in the space of $\mathbb{R}^{m}$, by selecting certain higher
scored elements along the columns of a dictionary. Since m enters
the Lasso cost $O(m^2 N)$ as a squared term, reducing m has a
quadratic effect on the cost of the optimization. In our work, we
have experimented with reducing m to different levels, not only to
lessen the computational cost. In doing so, we discovered that with
only a very small number of features, we can preserve the structure
of the dictionary, thereby keeping the classification performance
competitive.
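As a worked example of the quadratic dependence on m, the following short Python sketch evaluates the leading-order cost model $O(m^2 N)$ for the “Libras Movement” case study (dimension reduced from 90 to 20, dictionary of 180 atoms). The function name `lasso_cost` is hypothetical, constants and the overhead of the dimension reduction itself are ignored, so the ratio is only an order-of-magnitude indication.

```python
def lasso_cost(m, n_atoms):
    """Leading-order Lasso cost model m^2 * N quoted in the text (constants ignored)."""
    return m * m * n_atoms

full = lasso_cost(90, 180)       # original feature dimension
reduced = lasso_cost(20, 180)    # after dimension reduction
speedup = full / reduced
print(f"{speedup:.2f}")          # prints 20.25
```

Under this model the number of atoms cancels out of the ratio, so the estimated saving depends only on $(90/20)^2$.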

Experimental  Results 

In  this  section,  we  present  experimental  results  on  different 
data  sets  to  study  the  effectiveness  of  SRC  with  dimension 
reduction.  The  experiments  are  conducted  on  the  UCI  data 
sets  [17]  and  the  extended  Yale  face  database  B  [19]. 

Experimental  Setup 

In these experiments, we first apply four dimension reduction
methods to transform the data to a lower-dimensional space. As
mentioned before, PCA is a linear data transformation, LAP is an
unsupervised feature selection method, and COR and mRMR are
supervised feature selection methods. Then, we use three
classification methods, SRC, SVM and KNN, to compare the
classification accuracy. SVM is an effective and popular classifier
[46], which uses kernel methods to construct class boundaries in a
higher-dimensional space. KNN is a classic classifier and has
achieved good performance in recent studies [12].

Each data set is randomly partitioned into training and testing
sets at a 1:1 ratio. Each experiment is carried out five times, and
the final results are averaged. In the SRC method, the entire
training set is included in the dictionary, which is the same
setting as the work reported in [56, 63].
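The evaluation protocol (random 1:1 split, five repetitions, averaged accuracy) can be sketched in plain Python as follows. The helper names `split_half`, `one_nn` and `repeated_holdout`, the fixed seed, the toy data and the 1-nearest-neighbor placeholder classifier are assumptions for illustration; any classifier with the same call signature, including SRC, could be passed in its place.

```python
import random
from statistics import mean

def split_half(n, rng):
    """Random 1:1 partition of the indices 0..n-1 into training and test halves."""
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]

def one_nn(train_X, train_y, x):
    """Placeholder classifier: label of the nearest training sample (squared L2)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def repeated_holdout(X, y, classify, runs=5, seed=0):
    """Average test accuracy over several random 1:1 splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        tr, te = split_half(len(X), rng)
        train_X = [X[i] for i in tr]
        train_y = [y[i] for i in tr]
        correct = sum(classify(train_X, train_y, X[i]) == y[i] for i in te)
        accs.append(correct / len(te))
    return mean(accs), accs

# Two well-separated toy clusters (illustrative data only).
X = [(0.01 * i, 0.0) for i in range(10)] + [(5.0 + 0.01 * i, 0.0) for i in range(10)]
y = [0] * 10 + [1] * 10
avg_acc, per_run = repeated_holdout(X, y, one_nn)
```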

In these experiments, the sparse coding software is the l1_ls
package [33] from Stanford University, the SVM and KNN classifiers
are from the Java toolbox [55], and the other parameters of the
tools are set to their defaults. For SVM and KNN, the parameters are
chosen based on the performance over the test set. In detail, for
SVM we used three kernels (linear kernel, polynomial kernel and
radial basis function kernel), and the kernel parameters are 0.5 and
0.05, which gives six outputs for SVM testing. For KNN, the number
of neighbors “k” is set to 1 or 5, and the distances we used are L1
distance, L2 distance and cosine distance. Therefore, we also get
six results for KNN over the test set. The final parameters are
those that yielded the highest average performance over the test
set. Fig. 2 shows a case for the data set “Glass” with PCA dimension
reduction, in which the three best SVM and KNN results are listed.
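The two parameter grids described above can be enumerated with a short Python sketch. The string labels for kernels and distances and the helper `pick_best` are illustrative names only; they do not correspond to options of the toolbox used in the experiments.

```python
from itertools import product

# Three kernels x two kernel parameters, and two k values x three distances.
svm_grid = list(product(["linear", "polynomial", "rbf"], [0.5, 0.05]))
knn_grid = list(product([1, 5], ["l1", "l2", "cosine"]))

def pick_best(grid, avg_accuracy):
    """Return the configuration with the highest average accuracy.

    avg_accuracy is a mapping from configuration tuple to a measured score
    (hypothetical input; it would come from the repeated experiments).
    """
    return max(grid, key=lambda cfg: avg_accuracy[cfg])

print(len(svm_grid), len(knn_grid))   # prints 6 6
```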

Experiments  on  UCI  Data  Sets 

Our  experiments  cover  five  benchmark  UCI  data  sets  [17]. 
Due  to  the  difficulty  of  multi-category  classification  prob¬ 
lems,  most  selected  data  sets  are  multi-category  data  sets 
(except  “Anneal”).  Table  1  shows  the  detailed  information 
of  the  experimental  data  sets. 


Fig. 2 Classification comparison for data Glass based on PCA
dimension reduction: SVM and KNN with different parameters. The 3
best SVM and 3 best KNN performances based on different parameters
are shown. In the final comparison, SVM-2 (with polynomial kernel
(0.5)) and KNN-2 (with 5 neighbors and cosine distance) are chosen


Table 1 UCI data sets

Name              Feature number   Total size   Test size   Class
Wine              13               178          89          3
Glass             10               214          107         7
Libras Movement   90               360          180         15
Wine Quality      11               4,898        2,449       6
Anneal^a          11               798          399         2

^a The missing feature has been removed


In  these  experiments,  the  dimension  of  the  data  is 
changed  from  small  to  large  to  evaluate  the  effect  of 
number  of  features  on  classification  accuracy.  Then  SRC, 
SVM  and  KNN  are  applied  to  the  lower  dimensional  data 
to obtain the classification accuracies. The details are
shown in the following figures. Figure 3 shows the results for
data  set  “Wine.”  With  PCA  and  COR,  SRC  and  SVM  have 
higher  accuracies  in  higher  dimensions.  With  LAP,  KNN 
performs  slightly  better  than  SVM.  In  mRMR  results,  KNN 
has  a  higher  accuracy  than  SVM  and  SRC. 

The results for the data set “Glass” are shown in Fig. 4. SRC
produces consistently higher accuracies in all three dimension
reduction cases. SRC with PCA tends to be stable from dimension 3.
SRC with COR reaches more than 95 % from dimension 5 and up, which
is much higher than the results of SVM with COR and KNN with COR.
For the mRMR method, SRC’s performance improves dramatically with
over 4 features. On this particular data set, our results can be
compared with the results presented in [1]. In their work,
“Boost-NN”, “Allwein” and “Naive k-NN” were applied on the whole
data set with size of 214 for training, and the achieved
classification rates were 75.6, 74.8 and 73.2 %, respectively. These
results are similar to our SVM and KNN results with PCA or COR.
However, in our case, a smaller dimension and half of the data set
are used in training. SRC with PCA or COR clearly results in better
performance on this data set.

Fig. 3 Classification result for data Wine

Fig. 4 Classification result for data Glass


In Fig. 5, the SRC results show clearly higher accuracies on the
“Libras Movement” data set. In the PCA subfigure, when the
accuracies of SVM and KNN deteriorate, SRC’s accuracy remains at its
original level, demonstrating the ability of SRC to deal with noisy
data.

Fig. 5 Classification result for data Libras Movement

Fig. 6 Classification result for data Wine Quality


For the “Wine Quality” data set in Fig. 6, SRC shows
performance similar to SVM and KNN with PCA and LAP
features. The accuracies increase steadily with increasing
feature dimension. SRC with PCA tends to have higher
accuracy in higher dimensions (from dimension 6 upward).
SRC with COR has the worst performance, which needs to
be further investigated. With mRMR, SRC has stable
performance from dimension 6 onward.

“Anneal” in Fig. 7 is the only binary data set in our
experiment. SRC performs better than the other classifiers in
the PCA case, and SRC shows results similar to KNN in the
LAP and COR cases. SVM is stable on this data set, but its
accuracy is not competitive. KNN has an advantage in the
mRMR case.

Fig. 7 Classification result for data Anneal (panels: COR, mRMR)

Table 2 Comparisons of classification accuracy (%) based on UCI data sets

              PCA                    LAP                    COR                    mRMR
Data set      20 %   50 %   80 %     20 %   50 %   80 %     20 %   50 %   80 %     20 %   50 %   80 %
Wine-SVM      70.79  93.26  92.13    75.28  86.52  91.01    47.19  83.15  92.13    23.60  93.26  95.51
Wine-KNN      68.54  88.76  85.39    81.90  87.64  88.76    56.18  76.40  85.39    60.67  92.13  93.26
Wine-SRC      70.79  92.58  92.65    80.24  92.13  93.26    53.93  84.27  91.01    41.57  80.90  83.15
Glass-SVM     53.27  78.50  77.57    33.64  50.47  70.09    71.03  72.90  71.03    67.29  67.29  68.22
Glass-KNN     51.40  74.77  76.64    32.71  48.60  67.29    69.16  72.90  73.83    71.96  69.16  70.09
Glass-SRC     54.21  94.39  97.20    45.79  69.16  91.59    71.96  97.20  97.20    73.83  87.85  91.59
Libras-SVM    62.22  56.67  42.78    68.33  77.78  75.56    58.89  69.44  74.44    58.33  63.33  67.22
Libras-KNN    61.72  40.56  37.78    76.11  78.33  78.89    66.11  73.89  77.78    69.44  77.22  78.89
Libras-SRC    81.00  81.00  81.00    80.00  80.00  79.44    79.44  75.56  82.78    74.44  79.44  80.00
Wquality-SVM  49.75  53.38  57.00    42.75  55.88  56.50    54.75  56.00  55.25    54.00  55.13  54.25
Wquality-KNN  44.13  48.63  52.25    49.88  53.25  52.13    52.63  55.75  51.13    51.00  54.88  54.38
Wquality-SRC  48.50  55.13  58.75    45.50  58.63  57.38    50.00  59.25  58.13    49.75  51.38  56.75
Anneal-SVM    75.86  75.86  75.86    75.37  76.11  75.86    75.86  76.11  75.86    75.86  75.12  76.11
Anneal-KNN    75.86  76.60  76.85    82.02  74.88  76.35    79.06  80.05  75.86    77.59  83.99  75.86
Anneal-SRC    75.62  76.35  77.09    81.53  76.60  77.09    80.05  78.08  77.09    75.86  79.06  77.09

Table 2 is a comprehensive list of results over the UCI
data sets. The highest accuracies among SVM, KNN and
SRC are highlighted. SRC yields improved performance in
most cases. Table 3 shows the standard deviation of the
accuracy over the range from 50 to 80 %. Note the small
standard deviation for SRC, which highlights the stability
of SRC compared with SVM and KNN.
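
The stability measure can be sketched as follows. The accuracy curve below is hypothetical, and the text does not state whether the population or the sample standard deviation is used; this sketch assumes the population form (NumPy's default) and accuracies expressed as fractions, which matches the magnitude of the entries in Table 3.

```python
import numpy as np

# Hypothetical accuracy curve (as fractions), one value per retained-dimension
# level from 10 % to 100 % of the original features.
levels = np.arange(10, 101, 10)
accuracy = np.array([0.42, 0.55, 0.71, 0.80, 0.94, 0.95, 0.96, 0.97, 0.97, 0.97])

# Keep only the levels between 50 % and 80 % and measure their spread.
mask = (levels >= 50) & (levels <= 80)
stability = float(np.std(accuracy[mask]))  # population standard deviation (ddof=0)
print(f"std over 50-80 %: {stability:.4f}")
```

A value near zero means the classifier's accuracy barely changes across that range of dimensions.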


Experiments  on  Face  Recognition 

The extended Yale face database B [19] is used in the
second experiment. This data set contains 2,414 face
images of 38 people, captured in different environments.
Each face image is 54 × 48 pixels. Inspired by recent work
[63] using Gabor features for face recognition, this
experiment investigates the SRC-SPDR framework on Gabor
features. A set of Gabor filters with 5 scale levels and 8
orientations is applied to each face image with the same
parameters as in [63]. In total, there are 40 Gabor filters,
and each Gaborface has a size of 6 × 6. An example of
Gaborfaces is shown on the left of Fig. 8. The Gabor
features with 1,440 dimensions (40 × 6 × 6) are then used to
perform experiments similar to those described in Sect. 5.1.

Table 3 Standard deviation of accuracy

Data set      PCA     LAP     COR     mRMR
Wine-SVM      0.020   0.017   0.024   0.018
Wine-KNN      0.021   0.017   0.049   0.008
Wine-SRC      0.014   0.018   0.016   0.012
Glass-SVM     0.0047  0.11    0.0093  0.013
Glass-KNN     0.0089  0.10    0.0090  0.025
Glass-SRC     0       0.018   0       0.048
Libras-SVM    0.062   0.010   0.028   0.023
Libras-KNN    0.032   0.007   0.021   0.008
Libras-SRC    0       0.003   0.032   0.005
Wquality-SVM  0.017   0.009   0.003   0.011
Wquality-KNN  0.012   0.008   0.020   0.012
Wquality-SRC  0.008   0.002   0.017   0.008
Anneal-SVM    0       0.003   0.003   0.004
Anneal-KNN    0.004   0.006   0.019   0.038
Anneal-SRC    0.003   0.003   0.005   0.009
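
The dimension count of the Gabor feature vector can be checked with a small sketch. Only the 5 scales, 8 orientations, the 54 × 48 image size and the 6 × 6 Gaborface size come from the text; the kernel parameters (wavelengths, `sigma`), the FFT-based circular convolution and the block-averaging used to shrink each response to 6 × 6 are assumptions, since the actual parameters are those of [63] and are not given here.

```python
import numpy as np


def gabor_kernel(half, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid along direction theta."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * x_rot / wavelength)


def gabor_features(image, n_scales=5, n_orientations=8, out_shape=(6, 6)):
    """Filter with every kernel, keep magnitudes, and block-average each response to out_shape."""
    rows, cols = out_shape
    bh, bw = image.shape[0] // rows, image.shape[1] // cols
    spectrum = np.fft.fft2(image)
    features = []
    for s in range(n_scales):
        wavelength = 4.0 * 2.0 ** (s / 2.0)  # assumed scale progression
        sigma = wavelength / 2.0             # assumed envelope width
        half = int(np.ceil(2.0 * sigma))
        for o in range(n_orientations):
            kernel = gabor_kernel(half, wavelength, np.pi * o / n_orientations, sigma)
            # Circular convolution via the FFT; the kernel is zero-padded to the image size.
            response = np.abs(np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=image.shape)))
            blocks = response[:bh * rows, :bw * cols].reshape(rows, bh, cols, bw)
            features.append(blocks.mean(axis=(1, 3)).ravel())
    return np.concatenate(features)


# A random array stands in for a 54 x 48 face image.
face = np.random.default_rng(0).random((54, 48))
features = gabor_features(face)
print(features.shape)  # 40 filters x 36 values per Gaborface
```

With 40 filters and 36 values per filter, the concatenated vector has the 1,440 dimensions quoted above.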

In Fig. 8, we also evaluate selecting 100-800 features
by the LAP and COR methods. From this figure, one can
see that the 100 and 200 features selected by these two
methods are quite different. However, the accuracy of the
100 LAP-selected and COR-selected features is similar under
all three classifiers, as shown in Fig. 9. There are also
obvious differences between the selected features when the
dimensions are 400 and 800. Hence, we may conclude that
Gabor features have a lot of redundancy, and SPDR is
necessary. In Fig. 9, SRC shows a clear performance
improvement in the LAP and COR cases. Although the
performance of SRC with PCA is slightly worse than that of
SVM with PCA, the accuracy is still above 92 % and remains
stable at 95 % when the feature dimension exceeds 300.
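
How different two selected feature sets are can be quantified with a short sketch. It assumes that COR ranks features by their absolute Pearson correlation with the class label and that LAP uses the usual Laplacian score on a fully connected heat-kernel graph; the exact definitions used in this paper are given in an earlier section, so both scoring functions, the kernel width `t`, the toy data and the Jaccard overlap measure are illustrative assumptions.

```python
import numpy as np


def pearson_scores(X, labels):
    """Absolute Pearson correlation between each feature column and the label vector."""
    Xc = X - X.mean(axis=0)
    yc = labels - labels.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    return np.abs(Xc.T @ yc) / np.maximum(denom, 1e-12)


def laplacian_scores(X, t=None):
    """Laplacian score of each feature (smaller means better locality preservation)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dist = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        t = np.median(sq_dist)            # assumed heuristic for the kernel width
    S = np.exp(-sq_dist / t)              # sample-to-sample similarities
    d = S.sum(axis=1)                     # node degrees
    F = X - (d @ X) / d.sum()             # remove the degree-weighted mean of each feature
    numerator = np.einsum("ij,ij->j", F, d[:, None] * F - S @ F)   # f^T (D - S) f
    denominator = np.einsum("ij,ij->j", F, d[:, None] * F)         # f^T D f
    return numerator / np.maximum(denominator, 1e-12)


def top_k(scores, k, larger_is_better=True):
    """Indices of the k best-scoring features, returned as a set."""
    order = np.argsort(-scores if larger_is_better else scores)
    return set(order[:k].tolist())


# Hypothetical data: 100 samples, 50 features, only the first 5 carry class information.
rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 50)
X = rng.normal(size=(100, 50))
X[:, :5] += 3.0 * labels[:, None]

cor_set = top_k(pearson_scores(X, labels), k=10)
lap_set = top_k(laplacian_scores(X), k=10, larger_is_better=False)
jaccard = len(cor_set & lap_set) / len(cor_set | lap_set)
print(sorted(cor_set), sorted(lap_set), round(jaccard, 2))
```

A Jaccard overlap well below 1 with comparable downstream accuracy would be consistent with the redundancy argument made above: different subsets carry similar discriminative information.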

Figure 9 shows the classification rates for dimensions
ranging from 100 to 1,000. It is also interesting to
investigate the classification performance at very low
dimensions. Table 4 lists the classification results for
lower dimensions (100 and below) of Gabor features. The
number of selected dimensions varies from 10 to 100, out of
the original 1,440. With COR, LAP and mRMR, SRC obtains
higher accuracy than SVM and KNN at nearly every dimension;
the only exceptions are LAP and COR at dimension 10, where
SVM is marginally higher. With PCA, the SRC accuracies are
the highest at dimensions up to 80.

Fig. 8 Face image processing; the Gabor features selected with LAP and COR are shown

Fig. 9 Classification results for high-dimensional Gabor face features (panels: PCA, LAP, COR, mRMR)

Table 4 Classification accuracy (%) based on Gabor features

Dimension (d)     10      20      30      40      50      60      70      80      90      100
PCA-SVM           29.22   72.58   83.84   87.99   89.31   90.14   91.23   91.72   92.30   92.63
PCA-KNN           26.57   65.65   80.13   83.61   87.09   88.41   89.98   90.65   91.14   90.89
PCA-SRC           36.42   73.84   84.35   88.49   90.23   91.23   91.31   92.05   92.30   92.30
LAP-SVM           39.40   59.35   67.38   74.34   77.73   79.72   80.71   82.78   84.77   85.76
LAP-KNN           34.19   50.99   59.19   64.74   67.05   68.21   68.79   70.03   72.27   73.51
LAP-SRC           38.74   66.47   77.07   82.04   83.69   85.68   86.09   88.08   88.49   88.49
COR-SVM           42.96   60.51   70.20   77.90   79.72   81.54   83.53   84.69   85.60   86.75
COR-KNN           39.32   52.40   62.50   68.29   70.86   71.11   72.76   75.66   75.66   76.82
COR-SRC           42.38   64.49   77.40   83.36   85.76   88.00   89.32   89.90   89.74   90.65
mRMR-SVM          12.91   39.82   58.03   65.81   70.53   73.01   77.73   78.56   81.13   81.54
mRMR-KNN          46.27   63.41   67.05   71.61   73.18   76.57   77.57   76.41   77.07   78.39
mRMR-SRC          48.76   69.87   77.48   82.78   84.85   88.00   88.33   87.83   88.91   89.74

The highest accuracy among all the methods in each column is highlighted in bold

In addition to Gabor features, we also studied the
performance of SRC-SPDR on the original image pixel
values. The original number of pixels is 54 × 48 = 2,592,
which is too large for SRC, so dimensionality reduction is
necessary. In this experiment, three dimension reduction
methods are applied to the images to reach dimensions
from 10 to 100. SRC is then applied to the
dimension-reduced vectors to generate the classification
results in Table 5. The results are compared with those of
SRC using Gabor-PCA features, SRC using random-face
features [56] and SRC using downsample-face features [56].
To obtain dimensions from 10 to 100, the downsample
process is carried out with the ratios 1/260, 1/130, 1/87,
1/65, 1/52, 1/43, 1/37, 1/32, 1/29 and 1/26, respectively.
The actual dimensions are shown at the bottom of Table 5.

Table 5 SRC classification accuracy (%) on different feature sets

Dimension (d)     10        20        30        40        50        60        70        80        90        100
Gabor-PCA         36.42     73.84     84.35     88.49     90.23     91.23     91.31     92.05     92.30     92.30
Pixel-PCA         48.34     83.86     91.06     92.55     94.45     94.87     95.12     95.36     95.94     95.53
Gabor-LAP         38.74     66.47     77.07     82.04     83.69     85.68     86.09     88.08     88.49     88.49
Pixel-LAP         43.79     71.11     80.13     83.11     85.92     87.67     87.42     89.65     90.48     91.14
Gabor-COR         42.38     64.49     77.40     83.36     85.76     88.00     89.32     89.90     89.74     90.65
Pixel-COR         52.57     70.61     78.06     83.69     85.26     86.26     87.75     87.91     88.49     89.16
Gabor-mRMR        48.76     69.87     77.48     82.78     84.85     88.00     88.33     87.83     88.91     89.74
Pixel-mRMR        50.66     69.95     79.06     83.86     86.42     87.25     89.24     90.31     90.07     91.31
Pixel-Random      40.23     64.07     74.59     81.21     85.35     87.25     89.16     90.56     91.97     92.63
Pixel-Downsample  38.22     63.25     79.64     82.62     81.62     89.65(*)  91.64(A)  92.64(V)  92.80     93.29

The actual dimensions for *, A and V are 61, 71 and 81 due to the downsample process
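
The relation between the downsampling ratios and the resulting dimensions can be checked with a few lines of arithmetic. The sketch assumes that a ratio of 1/r means keeping every r-th pixel of the flattened 2,592-pixel image, which leaves ceil(2592 / r) values; the text does not spell out the rule, but under this assumption the results coincide with the actual dimensions 61, 71 and 81 noted under Table 5.

```python
pixels = 54 * 48  # 2,592 pixels per face image
ratios = [260, 130, 87, 65, 52, 43, 37, 32, 29, 26]  # denominators of the ratios 1/r

# Keeping every r-th pixel leaves ceil(pixels / r) values (integer ceiling division).
dims = [-(-pixels // r) for r in ratios]

# Cross-check against an explicit strided index range over the flattened image.
assert dims == [len(range(0, pixels, r)) for r in ratios]

print(dims)  # [10, 20, 30, 40, 50, 61, 71, 81, 90, 100]
```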

The highest classification rates at different dimensions
are highlighted in Table 5. It is interesting to note that
pixel-PCA always surpasses Gabor-PCA, which indicates
that SRC with dimension reduction works better in the
natural signal space than in the feature space. SRC with
pixel-PCA features reaches 83.86 % accuracy at dimension
20. By comparison, the method in [65], under the same
experimental settings as ours, achieves 81.5 % at dimension
56. Compared with SRC on random-face and downsample-face
features [56], SRC with pixel-PCA performs significantly
better, especially at very low dimensions (< 40). It is
important to point out that the results reported in [56]
were based on face images of size 192 × 168, while our
results are based on face images of size 54 × 48.

Conclusion 

A comprehensive study is conducted on a variety of dimension
reduction methods within the SRC framework. The purpose
is to use DR techniques to improve the sparse coding process
in both efficiency and accuracy. Experiments on the UCI and
face data demonstrate the effectiveness of this combination.
In particular, on the Glass and Libras Movement data sets,
SRC obtains around 20 and 30 % improvement in classification
accuracy, respectively, compared with SVM and KNN at lower
dimensions. SRC with Pixel-PCA features achieves more
than 90 % accuracy at dimension 30 on the face data set.
Based on the results, we have shown both experimentally and
theoretically that SRC is efficient with dimension reduction
methods. Due to the diversity of the data sets, it is not
clear which dimension reduction method is the best fit for
SRC, which is similar to the conclusion of the previous work
[56]. However, we could still observe that PCA + SRC
shows more advantages than the other combinations,
especially on the face data set.

There are many interesting future research topics along
this direction. For instance, with the continuing big data
challenge, how to integrate sparse representation with
complex data analysis tasks such as imbalanced data [27],
dynamic stream data [25, 26], and integrated prediction
and optimization [28], among others, has become a
significant research topic in the community. New research
foundations, principles and algorithms are needed to tackle
such challenges. Furthermore, large-scale experimental
studies are also needed to fully justify the effectiveness of
the proposed method. Finally, as intelligent data analysis is
critical in many real-world applications, how to bring the
proposed techniques to a wide range of application
domains is another important future research topic.